
    Classification and Target Group Selection Based Upon Frequent Patterns

    In this technical report, two new algorithms based upon frequent patterns are proposed. One is a classification method; the other is an algorithm for target group selection. In both algorithms, the collection of frequent patterns in the training set is constructed first. Choosing an appropriate data structure allows us to keep the full collection of frequent patterns in memory. The classification method uses this collection directly. Target group selection is a well-known problem in direct marketing; our selection algorithm is likewise based upon the collection of frequent patterns.
    Keywords: classification; association rules; frequent item sets; target group selection
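The frequent-pattern construction step the abstract describes can be sketched with a small Apriori-style enumeration. The transactions and the `min_support` threshold below are illustrative assumptions, not the report's data or its specific data structure:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets whose support meets min_support (Apriori-style)."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    # Candidates of size k are built from frequent itemsets of size k-1.
    current = [frozenset([i]) for i in items]
    k = 1
    while current:
        frequent = []
        for cand in current:
            support = sum(1 for t in transactions if cand <= set(t))
            if support >= min_support:
                result[cand] = support
                frequent.append(cand)
        # Join step: combine frequent k-itemsets that overlap in k-1 items.
        current = list({a | b for a, b in combinations(frequent, 2)
                        if len(a | b) == k + 1})
        k += 1
    return result

transactions = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c")]
patterns = frequent_itemsets(transactions, min_support=2)
```

Both proposed algorithms would then operate on a collection like `patterns`; a classifier could, for instance, score a new instance by the supports of the frequent patterns it matches.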

    Dilworth's Theorem Revisited, an Algorithmic Proof

    Dilworth's theorem establishes a link between a minimal path cover and a maximal antichain in a digraph. A new proof for Dilworth's theorem is given. Moreover, an algorithm to find both the path cover and the antichain, as considered in the theorem, is presented.
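The equality the theorem asserts can be checked on a small example. The sketch below uses divisibility on {1, ..., 6} as an illustrative partial order (an assumption, not the report's construction), computing the maximum antichain by brute force and the minimum chain cover via the standard bipartite-matching reduction:

```python
from itertools import combinations

# Illustrative poset: u < v iff u properly divides v, on {1, ..., 6}.
elems = list(range(1, 7))
less = {(u, v) for u in elems for v in elems if u != v and v % u == 0}

def max_antichain_size():
    """Brute-force the largest set of pairwise incomparable elements."""
    best = 0
    for r in range(1, len(elems) + 1):
        for sub in combinations(elems, r):
            if all((u, v) not in less and (v, u) not in less
                   for u, v in combinations(sub, 2)):
                best = max(best, r)
    return best

def min_chain_cover_size():
    """Minimum chain cover = n - max bipartite matching (Kuhn's algorithm)."""
    match = {}  # right-side vertex -> matched left-side vertex

    def augment(u, seen):
        for v in elems:
            if (u, v) in less and v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    matching = sum(augment(u, set()) for u in elems)
    return len(elems) - matching

antichain = max_antichain_size()
cover = min_chain_cover_size()
```

Here both quantities come out to 3, e.g. the antichain {4, 5, 6} and the chains {1, 2, 4}, {3, 6}, {5}.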

    Classification Trees for Problems with Monotonicity Constraints

    For classification problems with ordinal attributes, the class attribute very often should increase with each, or some, of the explaining attributes. These are called classification problems with monotonicity constraints. Classical decision tree algorithms such as CART or C4.5 generally do not produce monotone trees, even if the dataset is completely monotone. This paper surveys the methods that have so far been proposed for generating decision trees that satisfy monotonicity constraints. A distinction is made between methods that work only for monotone datasets and methods that work for monotone and non-monotone datasets alike.
    Keywords: classification tree; decision tree; monotone; monotonicity constraint; ordinal data
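The distinction between monotone and non-monotone datasets rests on a simple dominance check: whenever one instance is attribute-wise at least as large as another, its class label must be at least as high. A minimal sketch, with illustrative two-attribute data:

```python
def dominates(x, y):
    """Attribute-wise x >= y for ordinal attribute vectors."""
    return all(a >= b for a, b in zip(x, y))

def is_monotone(dataset):
    """A dataset of (attributes, label) pairs is monotone iff no dominating
    instance carries a strictly lower label than an instance it dominates."""
    return all(lx >= ly
               for x, lx in dataset
               for y, ly in dataset
               if dominates(x, y))

# Illustrative data: two ordinal attributes, one ordinal class label.
monotone = [((1, 1), 0), ((1, 2), 1), ((2, 2), 1)]
violated = [((1, 1), 1), ((2, 2), 0)]
```

In `violated`, the instance (2, 2) dominates (1, 1) yet has the lower label, which is exactly the kind of pair a monotone tree must not reproduce.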

    Improved customer choice predictions using ensemble methods

    In this paper various ensemble learning methods from machine learning and statistics are considered and applied to the customer choice modeling problem. The application of ensemble learning usually improves the prediction quality of flexible models like decision trees and thus leads to improved predictions. We give experimental results for two real-life marketing datasets using decision trees, ensemble versions of decision trees and the logistic regression model, which is a standard approach for this problem. The ensemble models are found to improve upon individual decision trees and to outperform logistic regression.
    Next, an additive decomposition of the prediction error of a model, the bias/variance decomposition, is considered. A model with a high bias lacks the flexibility to fit the data well. A high variance indicates that a model is unstable with respect to different datasets. Decision trees have a high variance component and a low bias component in the prediction error, whereas logistic regression has a high bias component and a low variance component. It is shown that ensemble methods aim at minimizing the variance component in the prediction error while leaving the bias component unaltered. Bias/variance decompositions for all models on both customer choice datasets are given to illustrate these concepts.
    Keywords: brand choice; data mining; boosting; choice models; bias/variance decomposition; bagging; CART; ensembles
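The variance-reduction mechanism the abstract describes can be illustrated with bagging: train high-variance base models on bootstrap resamples and average them out by majority vote. The sketch below uses one-split decision stumps on made-up one-dimensional data (illustrative assumptions; the paper itself uses CART trees on marketing data):

```python
import random

def train_stump(data):
    """Fit a one-split decision stump to (x, y) pairs with y in {0, 1}."""
    best = None
    for threshold in sorted({x for x, _ in data}):
        for flip in (False, True):
            preds = [(x >= threshold) != flip for x, _ in data]
            errors = sum(p != bool(y) for p, (_, y) in zip(preds, data))
            if best is None or errors < best[0]:
                best = (errors, threshold, flip)
    _, threshold, flip = best
    return lambda x: int((x >= threshold) != flip)

def bagged_ensemble(data, n_models, seed=0):
    """Train stumps on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    models = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_models)]
    return lambda x: int(sum(m(x) for m in models) > n_models / 2)

data = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]
predict = bagged_ensemble(data, n_models=25)
```

Each individual stump varies with its bootstrap sample; the vote averages that variance away while the bias of the stump family is unchanged, which is the decomposition argument made in the paper.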

    Boosting the accuracy of hedonic pricing models

    Hedonic pricing models attempt to model a relationship between object attributes and the object's price. Traditional hedonic pricing models are often parametric models that suffer from misspecification. In this paper we create these models by means of boosted CART models. The method is explained in detail and applied to various datasets. Empirically, we find a substantial reduction of errors on out-of-sample data for two out of three datasets compared with a stepwise linear regression model. We interpret the boosted models by partial dependence plots and relative importance plots. This reveals some interesting nonlinearities and differences in attribute importance across the model types.
    Keywords: pricing; marketing; data mining; conjoint analysis; ensemble learning; gradient boosting; hedonic pricing
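Gradient boosting of regression trees, the core of the boosted-CART approach, can be sketched in miniature: repeatedly fit a small regressor to the current residuals and add a shrunken copy to the ensemble. The regression stumps and the toy price data below are illustrative assumptions, not the paper's datasets:

```python
def fit_stump(xs, residuals):
    """Regression stump: split point and two leaf means minimising squared error."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - (lmean if x <= split else rmean)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def gradient_boost(xs, ys, n_rounds=50, rate=0.1):
    """Fit stumps to residuals; the model is the shrunken sum of stumps."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + rate * sum(s(x) for s in stumps)

# Illustrative 1-D "hedonic" data: price as a step function of object size.
xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
model = gradient_boost(xs, ys)
```

The shrinkage rate trades convergence speed against overfitting; with enough rounds the ensemble recovers the step, which a single misspecified linear fit cannot.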

    Neural Networks for Target Selection in Direct Marketing

    Partly due to a growing interest in direct marketing, it has become an important application field for data mining. Many techniques have been applied to select targets in commercial applications, such as statistical regression, regression trees, neural computing, fuzzy clustering and association rules. Modeling of charity donations has also recently been considered. The availability of a large number of techniques for analyzing the data may at first look overwhelming and ultimately unnecessary. However, the amount of data used in direct marketing is tremendous. Further, there are different types of data and likely strong nonlinear relations amongst different groups within the data. Therefore, it is unlikely that a single method can be used under all circumstances. For that reason, it is important to have access to a range of different target selection methods that can be used in a complementary fashion. In this respect, learning systems such as neural networks have the advantage that they can adapt to the nonlinearity in the data to capture the complex relations. This is an important motivation for applying neural networks to target selection. In this report, neural networks are applied to target selection in the modeling of charity donations. The various stages of model building are described using data from a large Dutch charity organization as a case. The results are compared with those of more traditional methods for target selection, such as logistic regression and CHAID.
    Keywords: neural networks; data mining; direct mail; direct marketing; target selection

    Direct Mailing Decisions for a Dutch Fundraiser

    Direct marketing firms want to transfer their message as efficiently as possible in order to obtain a profitable long-term relationship with individual customers. Much attention has been paid to address selection of existing customers and to identifying new profitable prospects. Less attention has been paid to the optimal frequency of contacts with customers. We provide a decision support system that helps the direct mailer to determine the mailing frequency for active customers. The system observes the mailing pattern of these customers in terms of the well-known R(ecency), F(requency) and M(onetary) variables. The underlying model is based on an optimization model for the frequency of direct mailings. The system provides the direct mailer with tools to define preferred response behavior and advises the direct mailer on the mailing strategy that will steer the customers towards this preferred response behavior.
    Keywords: decision support system; direct marketing; Markov decision process
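A Markov decision process for mailing decisions can be sketched with value iteration on a toy model. Every state, probability and reward below is an illustrative assumption (the paper's states are built from the RFM variables); the point is only the mechanics of choosing mail vs. skip per state:

```python
# Toy MDP: states are coarse customer-activity levels; the action is whether
# to mail this period. All numbers are made up for illustration.
states = ["inactive", "occasional", "loyal"]
actions = ["mail", "skip"]

# transition[s][a] = list of (next_state, probability);
# reward[s][a] = expected response revenue minus mailing cost.
transition = {
    "inactive":   {"mail": [("inactive", 0.95), ("occasional", 0.05)],
                   "skip": [("inactive", 1.0)]},
    "occasional": {"mail": [("occasional", 0.6), ("loyal", 0.4)],
                   "skip": [("inactive", 0.5), ("occasional", 0.5)]},
    "loyal":      {"mail": [("loyal", 0.7), ("occasional", 0.3)],
                   "skip": [("loyal", 0.5), ("occasional", 0.5)]},
}
reward = {
    "inactive":   {"mail": -4.0, "skip": 0.0},
    "occasional": {"mail":  2.0, "skip": 0.0},
    "loyal":      {"mail":  8.0, "skip": 0.0},
}

def optimal_policy(gamma=0.9, sweeps=200):
    """Value iteration: V(s) = max_a [ r(s,a) + gamma * sum_t P(t|s,a) V(t) ]."""
    value = {s: 0.0 for s in states}
    for _ in range(sweeps):
        value = {s: max(reward[s][a]
                        + gamma * sum(p * value[t] for t, p in transition[s][a])
                        for a in actions)
                 for s in states}
    # Greedy policy with respect to the converged values.
    return {s: max(actions,
                   key=lambda a: reward[s][a]
                   + gamma * sum(p * value[t] for t, p in transition[s][a]))
            for s in states}

policy = optimal_policy()
```

Under these numbers the policy mails occasional and loyal customers but skips inactive ones, because the mailing cost there outweighs the small reactivation chance; this is the kind of frequency advice the decision support system produces.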

    Modeling brand choice using boosted and stacked neural networks

    The brand choice problem in marketing has recently been addressed with methods from computational intelligence such as neural networks. Another class of methods from computational intelligence, the so-called ensemble methods such as boosting and stacking, has never been applied to the brand choice problem, as far as we know. Ensemble methods generate a number of models for the same problem using any base method and combine the outcomes of these different models. It is well known that in many cases the predictive performance of ensemble methods significantly exceeds the predictive performance of their base methods. In this report we use boosting and stacking of neural networks and apply this to a scanner dataset that is a benchmark dataset in the marketing literature. Using these methods, we find a significant improvement in predictive performance on this dataset.

    Repairing non-monotone ordinal data sets by changing class labels

    Ordinal data sets often contain a certain amount of non-monotone noise. This paper proposes three algorithms for removing these non-monotonicities by relabeling the noisy instances. The first is a naive algorithm. The second is a refinement of the naive algorithm that minimizes the difference between the old and the new label. The third, itself a refinement of the second, is optimal in the sense that the number of unchanged instances is maximized. In addition, the runtime complexities are discussed.
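One naive repair in this spirit (an illustrative sketch, not necessarily the paper's first algorithm) replaces each label by the maximum original label among the instances it dominates; the result is monotone by construction, though it may change more labels than the refined algorithms would:

```python
def dominates(x, y):
    """Attribute-wise x >= y for ordinal attribute vectors."""
    return all(a >= b for a, b in zip(x, y))

def naive_relabel(dataset):
    """Monotone relabeling: give each instance the maximum original label
    among the instances it dominates (including itself)."""
    return [(x, max(ly for y, ly in dataset if dominates(x, y)))
            for x, _ in dataset]

def is_monotone(dataset):
    return all(lx >= ly
               for x, lx in dataset
               for y, ly in dataset
               if dominates(x, y))

# Illustrative noisy data: (2, 2) dominates (1, 1) but has a lower label.
noisy = [((1, 1), 1), ((2, 2), 0), ((3, 3), 2)]
repaired = naive_relabel(noisy)
```

Here only the violating instance (2, 2) is relabeled, from 0 up to 1; the refined algorithms in the paper additionally control how far labels move and how many instances change.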